COMS 4771 Spring 2015 Features
Abstract
In many applications, no linear classifier over the “raw” set of features will perfectly separate the data. One recourse is to find additional features that are predictive of the label. This is called feature engineering, and it is often a substantial part of the job of a machine learning practitioner. In some applications, it is possible to “throw in the kitchen sink”, i.e., include every feature that might be relevant. For instance, in document classification, one can include a feature for each possible word in the vocabulary that indicates whether that word is present in the given document (or counts the number of occurrences). One can also include a feature for each possible pair of consecutive words (“bi-grams”), each possible triple of consecutive words (“tri-grams”), and so on. In general, it is common to automatically generate features based on existing features $x \in \mathbb{R}^d$, such as the quadratic interaction features $x \mapsto (x_1 x_2, x_1 x_3, \ldots, x_1 x_d, x_2 x_3, \ldots, x_{d-1} x_d) \in \mathbb{R}^{\binom{d}{2}}$, as well as higher-order interaction features. The main drawback of these “kitchen sink” feature expansions is that it may be computationally expensive to work explicitly in the expanded feature space. Fortunately, there are some ways around this.
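As a concrete illustration of the last point, the sketch below (an illustrative example, not taken from the notes; the helper names are ad hoc) writes out the quadratic interaction map explicitly, and then computes inner products in that $\binom{d}{2}$-dimensional space directly from the raw $d$-dimensional vectors in $O(d)$ time, using the identity $\sum_{i<j} x_i x_j z_i z_j = \tfrac{1}{2}\big((x^\top z)^2 - \sum_i x_i^2 z_i^2\big)$. Avoiding the explicit expansion in this way is the idea behind kernel methods.

# Minimal sketch (assumed example, not from the notes): explicit vs. implicit
# inner products for the quadratic interaction feature map phi(x) = (x_i x_j)_{i<j}.
import numpy as np
from itertools import combinations

def interaction_features(x):
    # Explicit expansion: all pairwise products x_i * x_j with i < j,
    # giving a vector with C(d, 2) coordinates.
    return np.array([x[i] * x[j] for i, j in combinations(range(len(x)), 2)])

def interaction_inner_product(x, z):
    # The same inner product computed implicitly in O(d) time:
    # sum_{i<j} x_i x_j z_i z_j = ((x . z)^2 - sum_i x_i^2 z_i^2) / 2.
    return 0.5 * (np.dot(x, z) ** 2 - np.dot(x * x, z * z))

rng = np.random.default_rng(0)
x, z = rng.standard_normal(6), rng.standard_normal(6)

explicit = np.dot(interaction_features(x), interaction_features(z))
implicit = interaction_inner_product(x, z)
assert np.isclose(explicit, implicit)  # both routes give the same value

For higher-order interaction features the explicit expansion grows combinatorially, while analogous closed-form expressions on the raw vectors remain cheap to evaluate, which is what makes this workaround attractive.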